eltlog.asc, 1991-04-12
The files "eltlog" and "eltlogxx" which appear on PACSAT, UoSAT-OSCAR-14
and LUSAT are logs of memory errors caused by cosmic radiation. The source
code accompanying this documentation is for a program which decodes and
displays the contents of an eltlog. The .exe file elogdisp.exe has been
compiled for MS-DOS computers.
The following extract from the UoSAT/Microsat file system documentation
describes the process by which SEUs are discovered and logged in the
"eltlog" files.
Programmers:
An eltlog file is simply a collection of the MEMERR structures
described below. Multi-byte values are stored INTEL style, least
significant byte first. "int" is 2 bytes, "long" is 4, "char" is 1.
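Since a log is just back-to-back records, a reader can step through it in fixed-size chunks. The sketch below is one hypothetical way to count the records in a file. The 19-byte record size is derived from the field sizes of the MEMERR structure given later in this document (4+2+2+2+4+2+1+2); the function name is made up for illustration, and the code assumes that any PACSAT file header has already been removed from the downloaded file.

```c
#include <stdio.h>

#define ELTLOG_RECORD_BYTES 19   /* sum of the MEMERR field sizes, no padding */

/* Count whole records in a stream opened in binary mode ("rb").
 * Returns the record count, or -1 if the stream ends with a partial record. */
long eltlog_count_records(FILE *fp)
{
    unsigned char rec[ELTLOG_RECORD_BYTES];
    long n = 0;
    size_t got;

    while ((got = fread(rec, 1, sizeof rec, fp)) == sizeof rec)
        n++;
    return (got == 0) ? n : -1L;
}
```

A trailing partial record is reported rather than ignored, because it usually means the file was truncated or uses the older format.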
If you write an elog display program for a non-IBM computer, please pass
it along by uploading it to one of the PACSATs.
JWW
P.S. The structure described here has been in use on AO-16 and LO-19
since March, and on UO-14 since 4th April. Older eltlog files use a
slightly different format described in earlier documentation. For the
statistically minded, the UO-14 eltlogs beginning with 10 April will
accurately reflect the SEU rate in 4 X 1024 X 1020 X 8 bits of static
RAM. Accurate statistics from AO-16 and LO-19 will be available after
those satellites are reloaded.
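For readers who want to turn a log into a rate, the arithmetic might look like the following sketch. The RAM size is the UO-14 figure quoted above; the function name and the choice of "upsets per bit per day" as the unit are illustrative assumptions, not part of the original software.

```c
/* Protected static RAM on UO-14, as quoted above: 4 x 1024 x 1020 x 8 bits. */
#define RAM_BITS (4UL * 1024UL * 1020UL * 8UL)   /* 33,423,360 bits */

/* Upsets per bit per day, given the number of logged SEUs and the
 * length of the observation period in seconds. */
double seu_rate_per_bit_day(unsigned long n_seus, double span_seconds)
{
    double days = span_seconds / 86400.0;
    return (double)n_seus / ((double)RAM_BITS * days);
}
```

The count and the time span would both come from the eltlog records themselves (number of records, and last time minus first time).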
---------------------------
5.5 SEU Correction and Monitoring
Cosmic radiation passing through semiconductor RAMs can cause stored bits to
change. The RAMDISK is protected against such single-event upsets (SEUs) by an
error-detection and correction (EDAC) code implemented in software. SEUs will
be corrected whenever data is read from the disk. If data were left on the
disk for several days without being read, SEUs might accumulate and defeat the
error correction capabilities of the EDAC code. To prevent this, all data in
the file system must be regularly read to detect and correct SEUs. This is
called a "disk wash", and it is implemented as part of the file system server
task. The entire disk - FAT, directory, files and free space - is washed
regularly.
The disk wash rate is controlled by the global variable called g_washperiod,
which is the number of seconds after which one cluster will be washed. The
default is 4 seconds. Every g_washperiod seconds, one cluster is selected and
read; if any error is detected, the cluster is corrected and re-written, and
an error log request is sent on the error logging stream.
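The wash period fixes how long a complete pass over the disk takes. A rough calculation is sketched below; the function name is hypothetical, and the cluster count used in the note that follows (4096) is an assumption inferred from the UO-14 RAM size quoted earlier, not a documented value.

```c
/* Seconds needed to wash every cluster once, at one cluster per wash period. */
unsigned long full_wash_seconds(unsigned long n_clusters,
                                unsigned long washperiod_s)
{
    return n_clusters * washperiod_s;
}
```

With an assumed 4096 clusters and the default g_washperiod of 4 seconds, one full wash would take 16384 seconds, or roughly four and a half hours.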
5.5.1 SEU Logging
When errors are detected, a MEMERR structure describing the error is sent on
the stream '_seus' to the station 'seulog'. The 'seulog' station must be
provided by a task other than the file system server. The function do_elt() in
the module elt.c is an error logging server which provides the 'seulog'
station and processes log requests. This function should be called by the host
task whenever the host task is posted. do_elt() appends the MEMERR structures
to a file called 'eltlog'. 'eltlog' has PFH file type 0x0b. There are 4 global
variables which control the operation of do_elt():
    gdoelt      if 0, stop logging errors;
    gmaxelterr  maximum number of errors to put in a log file;
    gcurelterr  current number of errors in the log file;
    gtotelterr  total number of errors logged since startup.
The file 'eltlog' can be downloaded or deleted at any time. If the file is
deleted, a new one is created when the next MEMERR structure arrives. The
<gcurelterr> variable is reset to 0 when a new elt log file is opened.
The current definition of the MEMERR structure is:
struct MEMERR {
    long time;              /* time of correction */
    unsigned int cluster;   /* cluster of error */
    int sector;             /* sector w/in cluster */
    int byte;               /* byte w/in sector */
    unsigned long fnumber;  /* file in which error was */
    int severity;           /* 1 = corrected, 2 = detected only */
    char pattern;           /* bit pattern, valid for severity 1 */
    int function;           /* reason for check */
};
There are NO slack bytes in the structure.
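Because the on-disk layout is packed and little-endian, a portable display program cannot simply fread() into the structure on a machine with different integer sizes or padding. The following is a minimal sketch of a byte-by-byte decoder; the names memerr_rec, memerr_decode and the get_* helpers are hypothetical, and the offsets follow from the field sizes stated above (long = 4, int = 2, char = 1).

```c
#define MEMERR_SIZE 19   /* 4 + 2 + 2 + 2 + 4 + 2 + 1 + 2, no slack bytes */

/* Host-side copy of one record; host field widths may exceed the file's. */
struct memerr_rec {
    long          time;
    unsigned int  cluster;
    int           sector;
    int           byte;
    unsigned long fnumber;
    int           severity;
    unsigned char pattern;
    int           function;
};

/* Little-endian unsigned 16-bit value. */
static unsigned int get_u16(const unsigned char *p)
{
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Little-endian signed 16-bit value, sign-extended portably. */
static int get_s16(const unsigned char *p)
{
    long v = (long)get_u16(p);
    if (v >= 0x8000L)
        v -= 0x10000L;
    return (int)v;
}

/* Little-endian unsigned 32-bit value. */
static unsigned long get_u32(const unsigned char *p)
{
    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/* Little-endian signed 32-bit value, sign-extended portably. */
static long get_s32(const unsigned char *p)
{
    unsigned long u = get_u32(p);
    if (u >= 0x80000000UL)
        return -(long)(0xffffffffUL - u) - 1L;
    return (long)u;
}

/* Decode one 19-byte record from buf into *r. */
void memerr_decode(const unsigned char *buf, struct memerr_rec *r)
{
    r->time     = get_s32(buf + 0);
    r->cluster  = get_u16(buf + 4);
    r->sector   = get_s16(buf + 6);
    r->byte     = get_s16(buf + 8);
    r->fnumber  = get_u32(buf + 10);
    r->severity = get_s16(buf + 14);
    r->pattern  = buf[16];
    r->function = get_s16(buf + 17);
}
```

Decoding through explicit shifts keeps the program independent of the host's byte order and structure padding, which matters for non-IBM computers.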
Much like a DOS disk, the RAMDISK is composed of small sectors which are
grouped into larger clusters; a cluster is the smallest unit of disk space
which can be allocated to a file. MEMERR.sector tells which 255-byte sector
the SEU was in; MEMERR.cluster tells what 1020-byte cluster the sector was
part of; and MEMERR.byte tells which byte in the sector contained the SEU.
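The sector and byte fields together locate the upset within its cluster. As a sketch, and assuming both fields count from zero (the text above does not say whether they are zero-based or one-based), the offset could be computed as follows; the function and macro names are hypothetical.

```c
#define SECTOR_BYTES         255
#define CLUSTER_BYTES        1020
#define SECTORS_PER_CLUSTER  (CLUSTER_BYTES / SECTOR_BYTES)   /* 4 */

/* Byte offset of the upset from the start of its cluster.
 * ASSUMPTION: sector and byte are both zero-based.
 * Returns -1 if either value is out of range. */
long memerr_cluster_offset(int sector, int byte)
{
    if (sector < 0 || sector >= SECTORS_PER_CLUSTER)
        return -1L;
    if (byte < 0 || byte >= SECTOR_BYTES)
        return -1L;
    return (long)sector * SECTOR_BYTES + byte;
}
```

If real logs turn out to contain sector 4 or byte 255, the zero-based assumption is wrong and the range checks should be shifted by one.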
The software EDAC code used to protect the RAMDISK can correct any error which
is confined to one byte. Even if the byte is completely destroyed, it will be
restored to its original condition. These correctable errors will have severity
code 1 in the MEMERR.severity variable. Errors which span more than one
byte, but are still detected by the EDAC code, will be given severity 2. Such
errors are NOT corrected.
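A display program might translate the severity code into text along these lines. This is only a sketch: the function name and the wording of the strings are illustrative, and any value other than 1 or 2 is treated as unknown because this document defines no other codes.

```c
/* Human-readable description of MEMERR.severity. */
const char *memerr_severity_text(int severity)
{
    switch (severity) {
    case 1:  return "single-byte error, corrected";
    case 2:  return "multi-byte error, detected but NOT corrected";
    default: return "unknown severity code";
    }
}
```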
If the error was correctable, the MEMERR.pattern byte shows which bits were
affected by the SEU. Generally, only one '1' bit will appear in MEMERR.pattern.
MEMERR.pattern has no meaning if MEMERR.severity is not equal to 1.
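To check whether an upset flipped a single bit or several, the pattern byte can be examined bit by bit. The helpers below are a hypothetical sketch; they should only be applied when MEMERR.severity is 1.

```c
/* Number of '1' bits in the pattern, i.e. how many bits the SEU flipped. */
int pattern_bit_count(unsigned char pattern)
{
    int n = 0;
    while (pattern) {
        n += pattern & 1;
        pattern >>= 1;
    }
    return n;
}

/* Write the pattern as 8 binary digits, most significant bit first.
 * out must have room for 9 characters including the terminator. */
void pattern_to_bits(unsigned char pattern, char *out)
{
    int i;
    for (i = 0; i < 8; i++)
        out[i] = (pattern & (0x80 >> i)) ? '1' : '0';
    out[8] = '\0';
}
```

For example, a pattern of 0x08 has a bit count of 1 and is rendered as "00001000".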
The MEMERR.function indicates whether the error was detected during an SEU
wash or during a user-initiated data read. The value is 1 for washes and 2 for
user reads.
MEMERR.fnumber indicates which file or part of the disk contained the SEU. It
is only valid if the error was detected during a wash, in which case
MEMERR.function will be 1. If MEMERR.function is not equal to 1, the error
was detected during a normal disk read, and MEMERR.fnumber has no meaning.
function  fnumber     meaning
--------  ----------  ----------------------------------------
1         0xfffffffe  error detected in unallocated cluster
1         0xffffffff  error detected in FAT by wash
1         0x00000000  error detected in directory by wash
1         n           error detected in file <n> by wash
2         x           error detected during read, file unknown
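The table above maps directly onto a small lookup routine. The sketch below uses a hypothetical function name and shortened strings; it treats every function value other than 1 as a read, following the rule that fnumber has no meaning unless function is 1.

```c
/* Where the error was found, according to MEMERR.function and MEMERR.fnumber. */
const char *memerr_location(int function, unsigned long fnumber)
{
    if (function != 1)
        return "detected during read, file unknown";

    fnumber &= 0xffffffffUL;   /* the field is 32 bits wide in the log */
    if (fnumber == 0xfffffffeUL)
        return "unallocated cluster";
    if (fnumber == 0xffffffffUL)
        return "FAT";
    if (fnumber == 0UL)
        return "directory";
    return "file";             /* fnumber is the file number <n> */
}
```

When the result is "file", the caller can print fnumber itself to identify the file.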
MEMERR.time tells when the error was corrected; as usual, it indicates the
number of seconds after 00:00:00 UTC January 1, 1970.
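On a host whose C library also counts time_t from 1 January 1970 (an assumption; this is true of POSIX systems but not guaranteed by the C standard), the time field can be formatted with the standard gmtime() and strftime() functions. The wrapper name below is hypothetical.

```c
#include <time.h>

/* Format MEMERR.time as "YYYY-MM-DD HH:MM:SS" in UTC.
 * ASSUMPTION: the host's time_t epoch is 00:00:00 UTC January 1, 1970.
 * Returns 0 on success, -1 if the time cannot be converted or out is too small. */
int memerr_time_string(long t, char *out, size_t outlen)
{
    time_t tt = (time_t)t;
    struct tm *utc = gmtime(&tt);

    if (utc == NULL)
        return -1;
    return strftime(out, outlen, "%Y-%m-%d %H:%M:%S", utc) ? 0 : -1;
}
```

A buffer of 20 characters is enough for the formatted string and its terminator.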